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^ Abstract 

Metropolized integrators for ergodic stochastic differential equations 

i 1 (SDE) are proposed which (i) are ergodic with respect to the (known) 

equilibrium distribution of the SDE and (ii) approximate pathwise the 
solutions of the SDE on finite time intervals. Both these properties 
c| are demonstrated in the paper and precise strong error estimates are 

obtained. It is also shown that the Metropolized integrator retains these 
properties even in situations where the drift in the SDE is nonglobally 
'— 1 Lipschitz, and vanilla explicit integrators for SDEs typically become 

^ unstable and fail to be ergodic. 
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1 Introduction 

Purpose of the Paper. This paper considers SDEs whose solution possesses 
a probability transition density that satisfies a detailed balance-type condition 
with respect to a known probability distribution. In this context the paper 
analyzes integrators for such SDEs that 
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(i) are ergodic with respect to the exact equilibrium distribution of the SDE 

on infinite time intervals; and, 

(ii) strongly converge to the solutions of the SDE on finite time intervals. 

Pure sampling methods can accomplish (i), but they typically do not approx- 
imate the solution to the SDE. Integrators for SDEs certainly satisfy (ii), but 
they often are divergent on infinite time intervals or ergodic with respect to 
a different equilibrium distribution [33,31,16,20]. This paper shows that a 
Metropolized integrator can simultaneously accomplish these goals. 

Motivation. This paper is motivated by SDEs that arise in molecular dy- 
namics (MD). Although the numerical analysis of general SDEs is well-studied 
(see, e.g., [31,20]), the relevance of this analysis to SDEs that arise in MD is 
limited. Indeed such SDEs are characterized by complex drift vector fields 
that have limited regularity, and one is typically interested in their solutions 
over long time intervals. Let us elaborate briefly on the consequences of these 
two observations. 

The drift vector field of SDEs that arise in MD involve the gradient of 
an empirically derived potential energy function. Due to singularities in the 
potential energy, this drift vector field possesses limited regularity, e.g., it is 
nonglobally Lipschitz. The drift vector field is also computationally costly to 
evaluate because the potential force involves intricate short and long-range 
interactions between atoms. As a result implicit methods are typically too 
costly in this application area. 

Moreover, the lack of regularity in the drift of SDEs that arise in MD causes 
explicit discretizations to be stochastically unstable in general. This implies 
that, in principle, such schemes cannot be used to sample the equilibrium 
distribution of the SDE. Some approaches to stabilize explicit discretizations 
of SDEs include 

• adaptive time-stepping [13]; 

• multiple time-stepping [35]; 

• method of rejecting exploding trajectories [21]; and 

• using bounded random numbers instead of Gaussian ones. 

These methods are often ergodic but typically introduce discretization errors in 
the sampling (i.e. the equilibrium distribution of the numerical scheme is not 
exactly the same as the distribution of the SDE). A standard way to remove 
these errors is to resort to a Monte-Carlo method designed for sampling, e.g., 
hybrid Monte-Carlo algorithms [6,11,1]. This procedure, however, has the 
effect that the dynamics of the method becomes unrelated to that of the SDE. 
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Indeed, in this context, the main question often becomes how to make the 
sampling as efficient as possible by accelerating the rate of convergence of the 
method. 

Here we show that a Metropolis-adjusted explicit integrator can both be 
ergodic with respect to the known equilibrium distribution of the SDE and 
captures the dynamical behavior of the solutions of this SDE. To establish 
this second property, one needs to estimate the effect on the dynamics of the 
rejections in the Metropolis-Hasting method. The analysis of this effect in 
the context of a Metropolized integrator for SDEs with the special structure 
relevant to MD is one of the main objectives of this paper. 

Organization of the Paper. In §2, the main results of the paper are pre- 
sented without proofs and put into context. In §3, numerical validation of 
the main results is provided. The proofs of the main results are organized 
according to the type of dynamic and discretization considered. In §4, the 
proofs pertaining to a forward Euler-Maruyama discretization of overdamped 
Langevin dynamics and its Metropolis-adjustment can be found. In §5, the 
proofs pertaining to a Stormer-Verlet based discretization of inertial Langevin 
dynamics and its Metropolis-adjustment can be found. We wrap up the paper 
with a conclusion in §6. 

2 Main Results 

2.1 Overdamped Langevin 

In the first part of the paper, we shall focus on overdamped Langevin dynamics 
on a energy landscape defined by a potential energy function U G C°°(lR n , R): 

dY = -VU(Y)dt + ^2(3- 1 dW (2.1) 

Here VU(x) : IR n — > IR n denotes the gradient of the function U, W is a 
standard n-dimensional Wiener process, or Brownian motion, and (3 > is 
a parameter referred to as the inverse temperature. Under certain regularity 
conditions on the potential energy, the solution to (2.1) is geometrically ergodic 
with an invariant probability measure \x that possesses the following density 
ir(x) with respect to Lebesgue measure [8,26,25]: 

tt(x) = Z^expi-fiUix)) (2.2) 

where Z = L n exp(—[3U(x))dx. Next we recall various integration strategies 
for (2.1) and summarize their properties. 
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Forward Euler-Maruyama. Let N and h be given, set T = Nh and 
tk = hk for k = 0, ...,N, and consider the following explicit Euler-Maruyama 
discretization of (2.1): 

X fe+1 = X k - hVU(X k ) + v^^(W(t fc+1 ) - W(t k )) (2.3) 

where « Y(t k ). The iteration rule (2.3) defines a Markov chain that 
possesses a transition kernel with the following smooth, strictly positive prob- 
ability transition density: 

*(.,„) = (*r/r>*)^«p (- l> -' 4 y^ WI ' ) (2.4) 

Hence, the chain is irreducible with respect to Lebesgue measure. To be con- 
sistent with the literature, the discrete-time Markov chain generated by Euler- 
Maruyama is called the unadjusted Langevin algorithm (ULA) [26]. 

If VU is globally Lipschitz and h is small enough, ULA (2.3) can often be 
shown to be: 

• first-order strongly convergent to the solution orbits of (2.1) on finite 
time intervals; 

• geometrically ergodic on infinite-time intervals with respect to an invari- 
ant measure that is a first-order approximant to the invariant measure 
fi of (2.1). 

The second property is typically established using a Talay-Tubaro expansion 
of the global weak error of ULA [33] . 

On the other hand, when VU is nonglobally Lipschitz ULA becomes a 
transient Markov chain for any h > [22, 19]. In fact, all moments of Euler- 
Maruyama are unbounded on long time-intervals for any initial condition x £ 
]R n , i.e., for any integer I > 1 and for any h > 

E x {\X k \ 2e } -> oo as k^oo (2.5) 

where K x denotes the expectation conditional on X = x. See, e.g., [16,32]. 
As is well known in the literature, a Metropolis-Hastings method can stabilize 
ULA. 

Metropolized Forward Euler-Maruyama. A Metropolis-Hastings method 
is a quite general Monte-Carlo method for sampling from a known probabil- 
ity distribution [18,9]. The method generates a Markov chain from a given 
proposal Markov chain as follows. The algorithm computes a proposal move 
according to the proposal chain and accepts this proposal with a probability 



2.1 Overdamped Langevin 5 



that ensures the Metropolized chain is ergodic with respect to the given prob- 
ability distribution. Here we shall focus on the Metropolized Euler-Maruyama 
integrator defined in terms of the equilibrium density ir (2.2) and the transition 
density q h (2.4). 

Let (k ~ U(0, 1) for k — 0, N — 1. Given h and X k the algorithm cal- 
culates a proposal move using the Euler-Maruyama updating scheme in (2.3): 

X* k+1 = X k - hVU(X k ) + y^(W(t k+1 ) - W(t k )) (2.6) 

and accepts this proposal with a probability 

a h (x, y = 1 A — -— -. 2.7 

q h (x,y)ir(x) 

That is, the update is defined as: 

Y ) x l+i if Cfe < oc h (X k , X* k+1 ) 

X k+1 = < . (2.8) 

I X k otherwise 

for k — 0, N — 1. To be consistent with the literature, we will refer to the 
Metropolized Euler-Maruyama integrator as the Metropolis-adjusted Langevin 
algorithm (MALA) [26]. By construction, MALA preserves the invariant mea- 
sure fj, of (2.1). If VU is globally Lipschitz and h small enough, one can 
often show that MALA is geometrically ergodic [26]. See, e.g., Theorem 4.1 
of [26]. However, if VU is nonglobally Lipschitz, MALA is often no longer 
geometrically ergodic. Still, for any g : IR n — ► IR with ^i{g) < oo, 

E fl E x {g(X k )}= [ gd/i, V k G N (2.9) 

where E^E 35 denotes expectation conditioned on the initial distribution being 
the equilibrium distribution of (2.1), i.e., 

E fl E x {g(X k )}= [ E x {g(X k )}fi(dx). 

The identity (2.9) is obviously a significant improvement to (2.5). 

When VU is nonglobally Lipschitz, MALA is not geometrically ergodic 
because ULA is transient (see Theorem 4.2 of [26]). To correct this problem, 
Roberts and Tweedie proposed a modification of MALA which truncates the 
drift in regions where the underlying Euler method can be explosive. Specifi- 
cally the proposal move (2.6) is modified to a MALA algorithm with bounded 
drift: 

Z '+' = Z " ~ h lvl\VU(k)\ + V^WW) - W(t k) ) (2.10) 
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When \VU(Zk)\ < 1/h, (2.10) is the Euler-Maruyama update, and otherwise, 
the proposal drift in (2.10) preserves the direction of the drift in ULA, but 
normalizes the amplitude in all degrees of freedom. The Metropolis-Hastings 
method with this modified proposal move will be referred to as the Metropolis- 
adjusted Langevin truncated algorithm (MALTA) [26]. By comparison to a 
random-walk-based Metropolis algorithm, one can often show that MALTA is 
geometrically ergodic [26,2,12]. While this ad hoc correction to MALA makes 
MALTA geometrically ergodic, the relation of MALTA to the original diffusion 
(2.1) remains to be established. 

Main Result I: Strong Convergence of MALA & MALTA. In the pa- 
per we shall consider the situation where VZ7 is nonglobally Lipschitz. Roughly 
speaking, MALTA corrupts Euler-Maruyama orbit by the random rejections 
in (2.8) with the modified proposal (2.10) and the ad hoc truncation of the 
drift. Yet, we show in the paper that MALTA still approximates pathwise the 
solution of (2.1). The precise statement is: 

Theorem 2.1 (MALTA Strong Accuracy). Assume 4-1- Then for every E > 
and T > 0, there exists h c (E ) > and C(T,E ) > such that for all 
h < h c , for all x : U (x) < Eq, and for all t G [0, T] , 

(E"{|Z Lt/fcJ -Y(t)\ 2 }) 1/2 < C(T,E )h 3 /\ 

Assumption 4.1 can be found in §4.1. This assumption remains valid for 
SDEs of the type (2.1) in which the drift is nonglobally Lipschitz. A key in- 
gredient in the proof is geometric ergodicity of MALTA. This property implies 
bounds on moments of MALTA at finite times. The precise bounds can be 
found in Lemma 4.9. 

Since MALA is not geometrically ergodic when VZ7 is nonglobally Lips- 
chitz, we are unable to obtain such bounds on moments of MALA at finite 
times. However, such bounds can be derived if MALA's initial condition is 
restricted to the equilibrium distribution of (2.1). This is because MALA by 
design preserves this distribution. Hence, for MALA we are able to prove the 
following theorem. We stress that MALTA'S strong accuracy does not require 
this restriction on initial conditions. 

Theorem 2.2 (MALA Strong Accuracy from Equilibrium). Assume 4-1- For 
all T > there exists h c > and C(T) > such that for all positive h < h c 
and for all t G [0, T], 



2.2 Inert ial Langevin 7 



The proofs of Theorems 2.1 and 2.2 as well as the precise statement of 
Assumption 4.1 on which they rely are presented in §4. The proofs build on 
results of global accuracy for numerical methods applied to SDEs with uni- 
formly Lipschitz drift [20] and with one-sided Lipschitz drift [10]. §3 will illus- 
trate these results on a simple but representative example with a nonglobally 
Lipschitz drift. 

2.2 Inert ial Langevin 

Next, we shall consider Langevin equations defined in terms of a Hamiltonian 
function H E C°°(M 2r \M): 

dY = JVH(Y)dt - 1 CVH(Y)dt + ^2 1 (3~ 1 CdW (2.11) 

where the following matrices have been introduced: 



J = 



" I 




"0 0" 


, c = 


-I 




I 



Here W is a standard 2n- dimensional Wiener process, or Brownian motion, 
(3 > is a parameter referred to as the inverse temperature, and 7 > is 
referred to as the friction coefficient. Write Y(t) = (Q(t), P(t)) where Q(t) 
and P{t) represent the instantaneous configuration and momentum of the 
system, respectively. We shall assume: 

H(q,p) = ±p T M- 1 p+U(q), 

where M is a symmetric positive definite mass matrix and U is a potential 
energy function. There are two main differences between the overdamped and 
inertial Langevin dynamics. First, the solution of (2.1) is a reversible stochastic 
process, whereas the solution of (2.11) is not. Second, the diffusion in (2.11) is 
only applied to momentum degrees of freedom, whereas the diffusion in (2.1) 
is applied to all degrees of freedom. The consequences of these differences to 
the discretization of (2.11) and its Metropolis-adjustment will be a main focus 
of §5. 

Despite the degenerate diffusion in (2.11), under certain regularity condi- 
tions on U, the solution to this SDE is geometrically ergodic with respect to 
an invariant probability measure /i with the following density [32]: 

n(q, p) = Z~ 1 exp (-f3H(q, p)) , (2. 12) 

where Z = / R2 „ exp (-(3H(q,p)) dqdp. 
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Geometric Langevin Algorithm. Let N and h be given, set T = Nh and 
tk = hk for k = 0, ...,N. We shall consider an integrator for (2.11) based on 
splitting the Langevin equations into Hamilton's equations for the Hamiltonian 
H: 

dQ = M^Pdt, 

* 2.13 
dP = -VU(Q)dt, 



and Ornstein-Uhlenbeck equations 

= 0, 



-^yM~ l Pdt + J2{3~ 1 -idW. 



(2.14) 



The solution of Hamilton's equations will be approximated by a symplectic 
integrator; to be specific, the discrete Hamiltonian map of a self-adjoint vari- 
ational integrator [15]. While the exact flow will be used for the Ornstein- 
Uhlenbeck equations. These flows will be composed in a Strang-type splitting 
to obtain a pathwise approximant to the solution of inertial Langevin which 
we will refer to as the Geometric Langevin Algorithm (GLA). 

This type of splitting of inertial Langevin equations is quite natural and 
has been recently used in simulations of molecular dynamics (see [37,5]), dis- 
sipative particle dynamics (see [29,28]), and inertial particles (see [23]). This 
paper is geared towards applications in molecular dynamics where Langevin 
integrators (including the ones cited above) have been based on generalizations 
of the widely used Stormer-Verlet integrator. The Stormer-Verlet integrator is 
attractive for molecular dynamics because it is an explicit, symmetric, second- 
order accurate, variational integrator for Hamilton's equations. In molecular 
dynamics it was popularized by Loup Verlet in 1967. Other popular generaliza- 
tions of the Stormer-Verlet integrator to Langevin equations include Brunger- 
Brooks-Karplus (BBK) [4], van Gunsteren and Berendsen (vGB) [36], and the 
Langevin-Impulse (LI) methods [30]. The LI method is also based on a split- 
ting of Langevin equations, but it is different from the splitting considered 
here. The long-time properties of GLA were recently analyzed in the context 
of uniformly Lipschitz potential forces in [3]. 

An example of a GLA that will be analyzed in §5 is the Stormer-Verlet 
based scheme: 

Q k+ i = Q k + hM- l e^ M ^P k - f M-'VUiQ,) 

+hJ2fF^ i tk+h/2 M- 1 e-^ M ' 1 ^ +h / 2 - s UW(s), 
P k+ i = e~^- lh P k - | e -7M- V 2 (vU(Q h ) + VU(Q k+1 )) 1 ' " J 

It is straightforward to show (2.15) possesses a smooth probability transition 
function with respect to Lebesgue measure even though the noise is only ap- 
plied to momenta in (2.14). For globally Lipschitz potential forces, it is possible 
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to show that (2.15) is a first-order strongly accurate integrator for (2.11), and 
can be shown to be geometrically ergodic with respect to an invariant measure 
that is a first-order accurate approximation to the exact invariant measure of 
(2.11) [3]. However, if the potential force is nonglobally Lipschitz, then this 
discretization is plagued with the same transient behavior as forward Euler- 
Maruyama applied to the reversible Langevin diffusion process. As before, a 
Metropolis-Hastings method is proposed to correct the discretization error in 
the invariant measure and stochastically stabilize this discretization. 

MAGLA. Let ( k ~ £7(0, 1) for k = 0, N — 1. To Metropolize GLA (5.5), 
we adopt the method of Metropolizing an inertial Langevin integrator intro- 
duced in [27]. In that paper, the authors present and test this Metropoliza- 
tion technique using a potpourri of integrators including the Ricci-Ciccotti 
algorithm as a proposal move [24]. The method presented in Scemama et 
al. involves the involution (p : M. 2n — > IR 2n : 



This involution is introduced since the stochastic process defined by composing 
the solution of (2.11) with (p is reversible with respect to //. 

The Metropolis-Adjusted Geometric Langevin Algorithm (MAGLA) will 
be based on the probability transition density of GLA which we will refer to 
as qh- An explicit formula for this transition density is derived in the body of 
the paper (See (5.6).). Given h and (Q k ,Pi t ), MAGLA computes a proposal 
move (Q* k+1 , P* k+1 ) according to a step of GLA composed with a momentum 
flip. For example, for the Stormer-Verlet based GLA this proposal move is 
given explicitly by: 



<p(q,p) = (q,-p)- 



Q k + hM- l e-' M ' lh ' 2 P k - f M- 1 VC/(Q fc ) 



Ql 



< 




(2.16) 



P* 



k+l 



e -lM-^hp k 




V 



The algorithm accepts this proposal move with probability: 




(2.17) 



In other words, the MAGLA update is defined as: 





otherwise 



(2.18) 
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for k = 0, N — 1. Observe, when a proposal move is rejected, the momen- 
tum is "flipped" ; and M AGLA involves a modified detailed balance condition 
(2.17) as opposed to the usual detailed balance condition used with MALA 
(2.7). The reason MAGLA uses a modified detailed balance is related to the 
nonreversibility of the solution to (2.11). In particular, the transition density of 
the solution to (2.11) does not satisfy detailed balance, but does satisfy a mod- 
ified detailed balance condition. It is straightforward to show that MAGLA 
preserves /i, and hence, is not transient. In fact, it is often easy to classify 
MAGLA as an ergodic Markov chain even if the potential force is nonglob- 
ally Lipschitz. In the next paragraph, we quantify the strong convergence of 
MAGLA. 

Main Result II: Strong Convergence of MAGLA. It is straightfor- 
ward to show GLA provides a first-order globally accurate integrator for (2.11) 
provided the potential force is globally Lipschitz [3]. As discussed MAGLA 
involves a momentum flip whenever a proposal move is rejected. This momen- 
tum flip was introduced since the stochastic process defined by composing the 
solution of (2.11) with a momentum flip satisfies detailed balance with respect 
to 7T. Despite this momentum flip, MAGLA provides a pathwise approximant 
to (2.11) as summarized by the following theorem. As in Theorem 2.2, we will 
restrict the initial conditions of MAGLA to be distributed according to the 
equilibrium distribution of (2.11). As before this assumption implies bounds 
on relevant moments of MAGLA which follow from the fact that MAGLA 
preserves the measure \x. 

Theorem 2.3 (Stormer-Verlet-Based MAGLA Strong Accuracy from Equilib- 
rium). Assume 4-1 (C) and 4-1 (E) on the potential energy and 5.1 on (2.11). 
Then for every T > 0, there exists h c > and C(T) > such that for all 
h < h c , for all x E R 2n , and for all t G [0, T), 



Assumptions 4.1 (C) and 4.1 (E) on the potential energy hold even if 
potential force is nonglobally Lipschitz. The assumption 5.1 on (2.11) ensures 
that the continuous process itself is globally Lipschitz. We stress that this 
regularity of the continuous process is different from assuming VZ7 is globally 
Lipschitz, and is made for convenience. 

Unlike the rejections in (2.8), a rejection in (2.18) leads to a momentum flip, 
and hence, a loss of local accuracy. Even though, MAGLA still approximates 
pathwise the solution to (2.11) because the probability of these rejections as 
a function of time-step size is small. These issues including a proof of Theo- 
rem 2.3 and the lemmas on which it is based can be found in 55. 
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3 Numerical illustration 




Time 

Figure 3.1: Exploding, Stagnating, and Contracting Orbits. For h = 0.3125, 
(3 = 1.0 and x = 4, this figure illustrates a realization of the solution to (3.1) (cyan), 
the Euler-Maruyama approximation (3.2) (green), and MALA (blue) initiated at the 
edge of Bh and driven by the same realization of the Wiener process W. Observe 
that Euler-Maruyama is explosive while the Metropolized orbit is not. Also observe 
that the position of the exact solution contracts rapidly while the Metropolized orbit 
stagnates before contracting due to proposal moves being rejected. 



3.1 Overdamped Langevin 

Here we illustrate the results above on the simple example of a particle diffusing 
in a quartic potential U(x) = x 4 /4 with inverse temperature (3 > 0. The 
overdamped dynamics of this system is given by: 

dY = -Y 3 dt + ^2{3~ 1 dW, Y(0) = x. (3.1) 

This SDE admits the following unique invariant measure 

fi(dx) = Z~ x exp(— (3x^1 A)dx. 

Let N and h be given. Set T = Nh and tk = hk for k — 0, N. In terms of 
which consider the following forward Euler approximation to (3.1): 

X k+1 = X k - hXl + ^/W~i(W(t k+1 ) - W(t k )), X = x. (3.2) 
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Figure 3.2: Long-Time Behavior of MALA. For h = 0.3125, f3 = 0.01 and 
x = 4.0, a long-time realization of the solution to (3.1) (cyan) and MALA (blue). 
Observe that the solution frequently visits higher energy values that are not reached 
by the Metropolized integrator. But, ergodicity requires that the Metropolized 
integrator visit these higher energy levels and for about the same ratio of time 
spent by the solution, since both chains sample from ir. Hence, the Metropolized 
integrator intermittently stagnates at higher energy values as shown in the figure. 



The drift in (3.2) is destabilizing in the region: 

B h = {x : |1 - hx 2 \ > 1}. 

This property of Euler-Maruyama's discrete drift is the essential reason why 
Euler-Maruyama defines a transient Markov chain. Indeed, using this prop- 
erty (2.5) is proved in Lemma 6.3 of [16]. Despite this shortcoming Euler- 
Maruyama can be used as candidate dynamics in a Metropolis-Hastings method 
designed to sample from /i. 

Figure 3.1 illustrates the difference between a given realization of the exact 
solution to (3.1), the Euler-Maruyama integrator (3.2), and the Metropolized 
integrator when initiated at the edge of and driven by the same realization 
of the Wiener process W . We empasize that in this figure the time-step is 
held fixed and chosen to be large. The figure shows the Euler-Maruyama orbit 
explodes, the solution orbit rapidly contracts to the origin, and the Metropolis- 
Hastings method initially stagnating, but eventually tracking the solution ap- 
parently better over time. We remark that the ability of the Metropolized orbit 
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Figure 3.3: Strong Accuracy of MALA. This loglog plot illustrates that despite 
random rejections of integration steps and the nonglobally Lipschitz nature of the 
drift, MALA remains mean-squared, and hence strongly, convergent to solutions of 
the SDE. The dashed line represents a reference slope of /i 3 / 4 . The solid line repre- 
sents an empirical estimate of the mean-squared order of accuracy of MALA for an 
inverse temperature (3 = 1, time-span of T = 1, and an ensemble of initial conditions 
drawn from the equilibrium distribution of the SDE using inverse transform sam- 
pling. The estimated rate of convergence agrees with that reported in Theorem 2.2. 
A total of 10 6 realizations were used to obtain this estimate. 



to track the solution better over time is not a generic property of Metropolized 
integrators. In this case it is a consequence of (3.1) satisfying a "one force, 
one solution" principle (See [7].). That is, for every realization of the Wiener 
process, the solution possesses a random attractor that the Metropolized orbit 
is drawn to. 

Figure 3.2 shows a longer-time realization of the Metropolized integrator 
and the solution. The solution at this temperature frequently visits higher 
energy values. The proposal moves to higher energy values are less likely to be 
accepted by the Metropolized integrator. At the same time, ergodicity implies 
the Metropolized integrator must visit these higher energy values, and as often 
as the exact solution along this long time-interval, since both chains preserve tt. 
Consequently, the Metropolized integrator intermittently stagnates as shown 
in the figure, and of course, loses pathwise accuracy. 

This observation does not contradict Theorems 2.1 and 2.2. Keep in mind 
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Figure 3.4: Strong Accuracy of MALTA. This loglog plot illustrates that 
despite random rejections of integration steps and the ad hoc truncation of the 
drift, MALTA remains mean-squared, and hence strongly, convergent to solutions 
of the SDE. The dashed line represents a reference slope of /i 3 / 4 . The solid line 
represents an empirical estimate of the mean-squared order of accuracy of MALTA 
for an inverse temperature (3 = 1, time-span of T = 1, and initial condition x = 0.1. 
The estimated rate of convergence agrees with that reported in Theorem 2.1. A 
total of 10 6 realizations were used to obtain this estimate. 



that the time-step size for this particular orbit is held fixed and chosen to 
be large. Such stagnations at high energy become less likely to occur along 
orbits on finite-time intervals as the time-step becomes smaller. Figures 3.4 
and 3.3 confirms this pathwise convergence. In particular, the figures show 
an (D(h 3 / 4 ) rate of strong convergence of the Metropolized integrator for this 
example using MALTA and an out of equilibrium initial condition, and MALA 
with an initial condition restricted to the equilibrium distribution of (3.1). 

3.2 Inert ial Langevin 

Consider a simple particle with Hamiltonian given by: 

H(q,p)=p 2 /2 + U(q) 
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Figure 3.5: Strong Accuracy of MAGLA. This plot illustrates that despite mo- 
mentum flips which occur at every rejection of a proposal move, and the nonglobally 
Lipschitz nature of the drift, MAGLA remains mean-squared, and hence strongly, 
convergent to solutions of the SDE. In this loglog plot, the dashed line represents 
a reference slope of h. The solid line shows a sample average of the mean-squared 
order of accuracy of MAGLA, for T = 1, f3 = 7 = 1 and (qo,po) = (0.1,0.0). The 
estimated rate of convergence is consistent with that reported in Theorem 2.3. A 
total of 10 5 realizations were used to obtain this plot. 



where U(q) = q 4 /4:. The inertial Langevin dynamics of this system is given 
by: 

' dQ = Pdt 
dP = -Q 3 dt - -fPdt + ^/2 1 f3- 1 dW, 

with initial condition (Q(0),P(0)) = (qo,po). This SDE admits the following 
unique invariant measure 



(3.3) 



fx(dqdp) = Z 1 ex.-p(—(3H(q,p))dqdp. 



3.2 Inert ial Langevin 16 



Let N and h be given. Set T = Nh and t k = hk for k = 0, N. In terms of 
which consider the following Stormer-Verlet based GLA discretization of (3.3): 

' Qk+i = Q k + he^ h ' 2 P k -^Xl 

+ V / 2^ ZI 7 Jl k+h/2 e~^ +h / 2 ^dW(s), 
P k+1 = e^ h P k - |e-^/ 2 (vU(Q k ) + VC/(Q fe+ i)) (3 ' 4) 
+ ^/2p z m' L k+h e-^ +h - s ^dW(s). 

with initial condition (Q ,P ) = (qo,po). The inertial Langevin integrator 
(3.4) is plagued with the same transient behavior as forward Euler-Maruyama 
applied to the reversible cubic oscillator Langevin dynamics. This occurs at 
points in phase space where the underlying explicit Stormer-Verlet integrator 
in (3.4) becomes linearly unstable. 

A Metropolis-Hastings method can stochastically stabilize (3.4). At the 
same time, as stated in Theorem 2.3, the method is also pathwise accurate 
with respect to the solution of (3.3) when initiated from equilibrium. This 
accuracy is achieved despite the loss of local accuracy in momentum that 
occurs at every rejection. 

Figure 3.5 is consistent with the strong 0(h) rate of convergence of MAGLA 
for this example. Assumption 5.1 is hard to check for this example, but we em- 
phasize the assumption is not equivalent to the potential force being globally 
Lipschitz. 
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4 Overdamped Langevin 
4.1 Preliminaries 

Structural Assumptions. For a function G G C°°(]R ri ,]R) and an integer 
r > 1, let VG and D r G be the gradient and the rt/i-derivative of G, respec- 
tively. Let | ■ | denote the Euclidean vector norm and || • || the Frobenius norm. 
Let L denote the generator of (2.1) defined for any G 6 C 2 (R n ,IR) as 

LG(x) = -VU(x) ■ VG{x) + /T 1 trace[Z> 2 G(a;)]. (4.1) 

Assumption 4.1. The following structural assumptions will be invoked in this 
paper regarding (2.1): 

A) There exists a real constant K > such that 

U(x) > K\x\, V xe R n . 

For every integer i > 1 there exist real constants 5e > and Mi > such 
that the operator (4-1) applied to the Ith-power of the potential energy 
satisfies: 

L{U(x) e } < -S e U(xY + M e , V xe R n . 

C) There exists a real constant K > such that 

\VU{x) - VU(y)\ < K{U{x) + U(y))\x - y\, V x, y e R n . 

D) There exists a real constant K > such that 

(-VU(x) + VU(y), x - y) < K\x - y\ 2 , V x, y e W 1 . 

E) There exists a real constant K > such that 

\\D 3 U(x)\\ V \\D 2 U(x)\\ V \VU(x)\ < K(l + U(x)), VsgR". 

We stress these structural assumptions hold even if the drift in (2.1) is 
nonglobally Lipschitz. These structural assumptions imply that the Markov 
process defined by the solution to (2.1) possesses a unique invariant probability 
measure explicitly given by: 

fi(dx) := ir(x)dx (4.2) 

where n(x) = Z~ x exp(— f3U{x)) with Z = J Mn exp(—j3U(x))dx. These struc- 
tural assumptions also imply the following estimates on higher moments of 
solutions to (2.1) which we state lemma. 



B) 
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Lemma 4.1 (Estimates on Higher Moments of Solution). Assume 4-1- For 
every integer £ > 1 we have the following estimates on higher moments of the 
solution to (2.1). 

A) Given Mn > and 5i > from assumption 4-1 (B), 

W{U{Y{t)Y) <U{x) e + -^, VsGf", Vt>0. 

Of. 

B) There exists K% > such that 

E x < K £ (l + U(x) e ), Vz6B n , Vt>0. 

C) There exists Ke > such that 

E x ^\Y(t) — x\ 2e ^ < Ki(l + t l U{xf l )t l , VxeR n , Vt>0. 

The proof of this lemma is presented in §7. 

The structural assumptions on (2.1) also imply a Lipschitz condition on its 
solution which we state as another lemma. 

Lemma 4.2 (Regularity of Solutions). Assume 4-1- For s < t, let Y t}S {x) 
denote the evolution operator of the solution to (2.1): with Y s s (x) = x and for 
r < s < t recall the Chapman-Kolmogorov identity Y t}S o Y Syr (x) = Y tjr (x). 
Set 

A = Y s+Ks (x) - Y s+h>s (y) - (x - y), 

For all K > there exists h c > 0, such that for all positive h < h c , for all 
x, y G W 1 , and for all s > 0, 

A) 

E{\Y s+hjS (x) - Y s+hyS (y)\ 2 } <\x- y\ 2 (l + Kh); 

B) 

E{|A| 2 } < Kh\\ + U{xf + U{yf)\x - y\. 
The proof of this lemma is also given below in section 7. 
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Explicit Integrator for the SDE. Recall that the transition kernel for 
ULA (cf. (2.3)) has a density qh(x,y) with respect to the Lebesgue measure 
given by 

,„(*,„) = (*r/r'k)^«p ( Jy ~* + -^ ( * )|2 ) ■ (4.3) 

It is derived by a simple change of variables relating the probability density 
of the Euler-Maruyama update to the probability density of a multivariate 
Gaussian with zero mean and variance 2(3~ 1 h. ULA is irreducible with respect 
to the Lebesgue measure because its transition density is smooth and strictly 
positive. However, as mentioned in the introduction, in Langevin diffusions 
with nonglobally Lipschitz drift, ULA is transient, i.e., stochastically unstable. 
Nevertheless under the assumptions on the drift, it is straightforward to obtain 
the following single-step accuracy for the forward Euler-Maruyama scheme. 

Lemma 4.3 (Local Strong Accuracy of Euler-Maruyama). Assume J^.l. For 
all K > there exists a h c such that for all positive h < h c and for all x 6 W 1 

A) the local mean-squared error of (2.3) satisfies 

W{\X X - Y(h)\ 2 } < K{1 + U(x) 4 )h 3 ; 

B) the local mean deviation of (2.3) satisfies 

\E X {X 1 - Y(h)}\ < K(l + U(x) 2 )h 2 . 



The above lemmas are proven in section 7. 



4.2 MALA and its Properties 

Ergodicity. MALA randomly rejects integration steps produced by Euler- 
Maruyama with a probability designed to ensure the composite chain preserves 
the equilibrium measure of (2.1). Recall that this probability is given by: 

M .,,) = lA *j*ffig> (4.4) 

The off-diagonal transition probability kernel of the composite chain has den- 
sity: 

Ph(x,y) = q h (x,y)a h (x,y). 

The probability of remaining at the same point, or stagnation probability, is 
given by 



r h (x) = 1 - / p h (x,z)dz. 
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In sum, the Metropolis-Hastings transition kernel is 

P h (x, dy) = p h (x, y)dy + r h (x)5 x (dy). (4.5) 

where 5(-) denotes the Dirac-delta measure in IR n . It is straightforward to 
show that this Ph is a well-defined probability transition kernel. Let B(M. n ) 
denote the smallest a- algebra containing all the open subsets of M. n . For all 
x G W 1 and A G B(W 1 ), the fc-step transition probability kernel is iteratively 
defined as 

P k h {x,A)=l P h (x,dy)Pl;- 1 (y,A), (k > 1), 

with P°(x, A) equal to the characteristic function of the subset A of R n , i.e., 
P°(x,A) = X a(x). 

Theorem 4.4 (MALA Ergodicity). Under assumption 4-1 on the potential en- 
ergy, the k-step transition probability of MALA converges to fi in the following 
sense 

sup \P£(x,A) - n(A)\ -> 0, as k -> oo, V xeR n . 

AeB(M n ) 

Proof. The following classification of MALA is quite standard. Since 7r and 
are strictly positive and smooth everywhere, MALA is irreducible with respect 
to Lebesgue measure and aperiodic; see, e.g., Lemma 1.2 of [17] and references 
therein. By the design of the acceptance probability, MALA also preserves the 
probability measure \x. To confirm this statement observe from (4.5) that for 
any A G B(W n ) 

/ fi(dx)P h (x,A) = ( / n(x)p h (x,y)dy + 7i(x)r h (x)xA{x)) dx 
JR" JR n \Ja J 

{%(x)p h (x, y) - ir(y)p h (y, x)) dydx + fx(A) 



A 



However, since ph(x,y) satisfies detailed balance the first term in the above 
vanishes, and hence, 

n(dx)P h (x,A) =y.(A) 



According to Corollary 2 of [34] , a Metropolis-Hastings algorithm that is irre- 
ducible with respect to the same measure it is designed to preserve is positive 
Harris recurrent. Consequently, MALA is irreducible, aperiodic, and positive 
Harris recurrent. According to Proposition 6.3 of [22], these properties are 
equivalent to ergodicity of the chain. ■ 

Remark 4.1. This theorem shows that MALA is ergodic with respect to the 
equilibrium measure of (2.1). However, the effect of rejections on the pathwise 
approximation of the solution remains to be quantified. 
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Equilibrium Strong Accuracy. In order to prove equilibrium strong ac- 
curacy of MALA the following lemmas will be helpful. 

Lemma 4.5 (MALA Stagnation Probability). Assume 4-1- For all integers 
i > 1, there exists an h c > and a constant > 0, such that for all positive 
h < h c and for all x E M. n , 

E x {K(x, X{) - I) 2 "} < K t (l + U{x) u )h u , 

where 

X\ = x- hVU(x) + yjip-^h-q. 

andt] ~ jV(0,l) n . 

Proof. Let R h (x) = {y e R" : a h (x,y) < 1}. Then, 

W{{a h {x,X\)-ir}= [ ( qh ^ X \ n{y) -l) 2 \ h (x,y)dy (4.6) 

Introduce the function GiE^xE^I 

G(x, y) = U{y) - U{x) - l - (VU(y) + VU(x), y - x) 
+^{\VU(y)\ 2 -\VU(x)\ 2 ). 
Using the expressions for and 7r (cf. (4.3) and (4.2), resp.) one can show 

q h (x,y)7r{x) 
Hence, (4.6) can be written as: 
W{{a h (x,X$-l)*} = 

r r. f ( \y-x + hVU( x )\ 2 \ 

{Anp^h)-^ 2 (e-Ww) - i) M e \ ^ Jdy (4.7) 

jRiJx) 



Introduce the map (p : W 1 



ip{£) =x + ^/2hf3- 1 £ + hVU{x). 
Set Rh(x) = cp^ 1 (Rh(x)). A change of variables of (4.7) under ip yields, 
E x {(a h (x,X* 1 )-ir} 

= (27r)~ n/2 / f e -PG( x , x +y/2hp-^+hVU(w)) _ A e -|l€l 2 ^ (4.^ 

jR h (x) ^ ' 
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For h sufficiently small the latter integral can be well-approximated by a 
Taylor expansion as follows. A Taylor expansion of the function G about h = 
yields, 



G(x, x + v / 2hf3- 1 $, + hVU{x)) 



(VU(x), D 2 U(x) ■ £) - %^D 3 U(x) ■ e ) h 3 ' 2 + 0(h 2 ' 



Substitute this expansion into (4.8) to obtain, 



f ( e -/3G(x,x+y/2hl3-^+h\7U(x)) _ ^ 
jR h (x) ^ 

J_ ^ ^ \h" i^p- (VU(x),D 2 U(x) ■ - 2 -^D 3 U(x) ■ e^j + 0(h 3 ^) ) e" 
An application of Laplace method to approximately evaluate integrals yields 

f f e -f3G(x,x+y/2hl3- 1 £+hVU(x)) _ ^ 
JR h (x) ^ 



ifc(as) 

(VU(x), D 2 U(x) ■ £) - 2 I^D 3 U(x) ■ e) e-^ 2 d£ + 0(h 3i+1 ) 
2 '3 J 

The proof is completed by invoking Assumption 4.1 (E). ■ 

Lemma 4.6 (Local Accuracy of MALA). Assume J^.l. For all h > there 
exists a C > such that 

A) the local mean-squared error of MALA satisfies 

E X {\X 1 - Y(h)\ 2 } < C(l + U{xf)h 5/2 , V x e R n ; 

B) the local mean deviation of MALA satisfies 

\E X {X 1 - Y{h)}\ < C(l + U(xf)h 2 , VieR". 

Proof. A) By definition of MALA, 

E X {\X 1 -Y(h)\ 2 } = 

E x {\Xl - Y(h)\ 2 a h (x,Xl) + \x- Y(h)\ 2 (1 - a h {x,X\))} . 
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Hence, 

E^jlXi - Y(h)\ 2 } < 

K x {\Xl - Y(h)\ 2 } + E x {\x - Y(h)\ 2 (l - a h (x, X*))} . (4.9) 

It is clear from this expression that the local mean-squared error of the Metropolis- 
Hastings integrator is due to the local mean-squared error of forward Euler- 
Maruyama and a term due to probable rejections in the Metropolis-Hastings 
step. By the Cauchy-Schwarz inequality this latter term can be bounded as 
follows, 

E°{\x-Y(h)\ 2 (a h (x,Xl)-l)} < 

(E- {\x - Y(h)\*}f 2 • (E" {K(x, XI) - 1) 2 }) 1/2 . 

Lemma 4.1 implies there exists a constant K > such that 

(W{\x - Y(h)\ 4 }) 1/2 < K(l + U(x) 2 )h. 

While Lemma 4.5 implies that there exists a constant K > such that 

(E* {(a h (x, XI) - I) 2 }) 1 ' 2 < K(l + U(x) 2 )h 3 / 2 

Thus, there exists a constant K > such that 

E x {\x- Y(h)\ 2 (l - a h (x,Xl))} < K(l + U^f)^ 2 . 

Observe that (4.9) is dominated by the error incurred when a proposal move is 
rejected, and not the 0(h 3 ) error incurred when a proposal move is accepted 
(cf. Lemma 4.3). Hence, the local mean-squared accuracy of MALA is 0(h 5 ^). 
B) Similar to the proof of part (A) 

E* {X, -Y(h)}= E x {(Xt - Y(h))a h (x, X*) 

+ (x-Y(h))(l-a h (x,Xl))} 

By the triangle inequality 

IE- {X, - Y(h)}\ < \E X {(XI - Y(h))a h (x, X*)}| 

+ \E x {(x-Y(h))(l-a h (x,Xl))}\ 
< |E a! {(X*- Y(h))}\ 

+ \E x {(x-Y(h))(l-a h (x,Xl))}\ 

It is clear from this expression that the local mean deviation of the Metropolis- 
Hastings integrator is due to the local mean-deviation of forward Euler-Maruyama 
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and a term due to probable rejections in the Metropolis-Hastings step. By the 
Cauchy-Schwarz and Jensen inequalities, 

\E X {X 1 - Y(h)}\ 2 < 2 \E X {(X* - Y(h))}\ 2 

+ 2\E x {(x-Y(h))(l-a h (x,Xl))}\ 2 
<2\E x {(Xl-Y(h))}\ 2 

+ 2 (E x {\x - Y(h)\ 4 }) 1/2 ■ (E x {(a h (x,Xt) - 1) 4 }) 1/2 

As in part (A) of this proof, the local mean deviation of forward Euler- 
Maruyama Lemma 4.3 together with Lemma 4.1 and Lemma 4.5 imply that, 

\E X {Xi - Y(h)}\ 2 < K(l + F(xf)h\ 



Lemma 4.7 (Uniform In Time Bound on Error). Assume J^.l. Then for all 
h > and for all t > 0, there exists a real constant C > such that 

E M E*{|X L ^j -Y(t)\ 2 } <C. 

Proof. This bound is a consequence of ergodicity of the solution to (2.1) 
and MALA with respect to the probability measure //. That is, the ergodic 
property of the integrator and the solution of (2.1) imply that 

E M E x {|X Lt//lJ -Y(t)\ 2 } 

< 2E ll E x { |X Lt/fcJ | 2 } + 2E ll E x {\Y(t)\ 2 } 




We are now in measure to prove Theorem 2.2 which we restate. 

Theorem 2.1 (MALA Equilibrium Strong Accuracy). Assume J^.l. For all 

T > there exist h c > and C(T) > 0, such that for all positive h < h c and 
for allt e [0,T], 

(e^E* {\X [t/hi - ^(t)| 2 }) 1/2 < C(T)ti"\ 
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Proof. Discretize the interval [0, T] using N equally spaced points in such 
a fashion that t k = kh for k = 0,...,N with N = \T/h\. Assume h > 
but otherwise arbitrary for now. The goal in this proof is to obtain a global 
error estimate for the mean-squared error of MALA by writing the error at the 
(k + l)th step in terms of the error at the kth step. For clarity of presentation 
we will use a rolling constant K throughout this proof. Set 

e k+1 = E^E* {|X fe+1 - Y tk+u0 (x)\ 2 } 

for k = 0, N — 1. Specifically, the proof shows that there exists a constant 
K > such that 

e k+1 < (1 + Ah)e k + Kh 5 / 2 . (4.10) 

By unraveling this recursive inequality, the order h 3 ^ mean-squared accuracy 
becomes apparent. To do this the expectation of the error at the (k + l)th step 
is conditioned on knowing all events up to the kth step. For this purpose let 
J-'k denote the sigma-algebra of events up to and including the H/i-iteration. 
Also write, 

\X k+1 -Y tk+lfi (x)\ 2 = 

\X k+1 - Y tk+1)tk {X k ) + Y tk+1 ,t k {X k ) - Y tk+lttk (Yt h , {x))\ 

Expanding this expression gives: 

\X k+1 -Y tk+lfi (x)\ 2 = 

\X k+1 - Y tk+utk (X k )\ + \Y tk+utk (X k ) - Y tk+utk (Y tk ,o(x))\ 

+ 2 (X k+1 - Y tk+1)tk (X k ), Y tk+utk (X k ) - Y tk+l!tk (Y tk>0 (x))) (4.11) 

It follows from the local mean-squared accuracy of MALA (cf. Lemma 4.6), 

E{|X fc+1 - Y tk+1 , tk (X k )\ 2 | T k ) <K(1 + U(X k ) 4 ) h 5 ' 2 . 

Since MALA preserves /i by design, and by the law of total expectation, 

^W{^{\X k+l - Y tk+utk (X k )\ 2 | T k }} <K(1 + fi(U*)) h 5 ' 2 

where fi(U 4 ) = L n U^dfi. Similarly, Lemma 4.2 implies that 

E,E X [\Y tk+1 , k (X k ) - Y tk+1 , k (Y tkfi (x))\ 2 } < (1 + Kh)e k 

This leaves the third term in (4.11). Set 

A = Y tk+utk (X k ) - Y tk+utk (Y tkfi (x)) - (X k - Y tk>0 (x)) 
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In terms of which, rewrite the third term in (4.11) as: 

(X k+1 - Y tk+1;tk (X k ), X k - Y tk)0 (x)) + (X k+1 - Y tk+1>tk (X k ), A) . 
Cauchy-Schwarz inequality implies, 

E^E* {<E {X k+1 - Y tk+1>tk (X k ) | F k } ,X k - r tfc>0 (aO)} 
<E^{\E{X k+1 -Y tk+ljtk (X k ) | -F fc }| 2 } 1/2 ef 
It follows from Lemma 4.6 and invariance of /i under MALA that, 

E^E* {|E {X k+1 - Y tk+1>tk (X k ) | ^.}| 2 } 1/2 < K (1 + fi(U 3 )) h 2 

Cauchy-Schwarz inequality also implies, 
E,E X {K{(X k+1 -Y tk+1 , k (X k ),A) | F k }} 

1 /2 

<E^E x {E\\X k+1 -Y tk+utk {X k )\ 2 \T k ]] E^E X {E {|A| 2 \ T k }} 1/2 
< A'(l + /i([/ 2 ))E M E 3; {E{|A| 2 | ^}} 1/2 /i 5 / 4 
However, according to Lemma 4.2, 

E^E X {E {|A| 2 | Fk}} 1 ' 2 < Kh(l + fi(U))el /A 

In sum, there exists a real constant K > such that the following term bounds 
from above the third term in (4.11): 

KhW#* + Kh 2 e]/ 2 . 
Using elementary inequalities it is clear that, 

2h 2 Kel /2 < he k + h^ 2 K 2 

and 

2h g / 4 Kel /4 < h 5 / 2 K 2 + e]! 2 h 2 < h^ 2 K 2 + i (e k h + h^' 2 ) . 
Combining the bounds on the iterated expectations of (4.11) yields (4.10). ■ 

4.3 MALTA and its Properties 

A necessary condition for geometric ergodicity of a Metropolis-Hastings method 
is that the rejection or stagnation probability is bounded away from unity (see 
proposition 5 of [25]). When the drift in (2.3) is nonglobally Lipschitz, ULA 
is transient and MALA's stagnation probability cannot be globally bounded 
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away from unity. Hence, MALA is not geometrically ergodic. MALTA corrects 
this problem by truncating the drift in the candidate dynamics in regions where 
the Euler-Maruyama integrator can be explosive. This modification enables 
a precise control of the moments of the integrator initiated from a nonequi- 
librium initial condition. With this control we are able to prove pathwise 
accuracy of MALTA on finite time-intervals. 

The geometric ergodicity of MALTA for super-exponential target densities 
with non-degenerate level-sets was intuitively clear to the investigators who 
proposed the algorithm in [26]. Subsequently, it was proven in proposition 2.1 
of [2] using techniques to prove geometric ergodicity for random walk Metropo- 
lis candidate dynamics given in [12]. For the convenience of the reader, the 
theorem is restated. 

Theorem 4.8 (MALTA Geometric Ergodicity). Assume J^.l. For every h > 
0, there exists a nonnegative function M : W n — > R, real constant p < 1, 
such that the k-step transition probability Pfi(x, ■) of MALTA converges to fi 
geometrically fast: 

\\P%(x, •) - fj,\\ M < M(x)p k , VfceN, V xeW n , 

where we have introduced a TV norm \\ ■ \\m which is defined for a signed 
measure v as: 

\\u\\ M = sup \u(f)\. 
f- \J\<m 

The main implication of geometric ergodicity for our purpose is the fol- 
lowing crucial estimate on moments of MALTA. The proof of this lemma is a 
straightforward, but tedious consequence of the results in Proposition 2.1 and 
Lemma 6.2 of [2]. 

Lemma 4.9 (Estimates on Higher Moments of MALTA). Assume 4-1- For 
every E > 0, for every £ > 1, there exists a h c > and C(E ) > such that 
for all positive h < h c , for all x : U(x) < E , and for all t > 0, 

sup E x {U(Z im Y} <C(E ). 

h<h c 

We are now in measure to prove Theorem 2.1. 

Theorem 2.2 (MALTA Strong Accuracy). Assume J^.l. Then, for every 
E > and T > 0, there exists a h c (E ) > and a C(T,E Q ) > such 
that for all positive h < h c , for all x : U (x) < E , and for all t G [0, T], 

(E x [\Z m] - Y tfi \ 2 }y /2 < C(T,E )h 3 /\ 
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Proof. The proof of this theorem is similar to the proof of Theorem 2.2 with 
the following main differences. 

• The preservation of the moments of the solution to (2.1) and the Metropolis- 
Hastings method due to ergodicity are replaced by bounds on moments 
implied by Lemmas 4.1 and 4.9. 

• The k + 1-step error conditioned on knowing T k is split into two parts: 



E{\Z k+1 -Y(t k+1 )\ 2 | F k ) < 

E{\Z k+1 - Y{t k+1 )\ 2 | T k k h\VU(Z k )\ < l}P(h\VU(Z k )\ < 1) 
+E{\Z k+1 -Y(t k+1 )\ 2 | T k k h\VU(Z k )\ > 1} P(h\VU(Z k )\ > 1) 

For the first part, the results of Theorem 2.2 apply with the provision 
given in the first difference above. For the second term, Chebyshev's 
inequality implies: 

P{h\VU{Z k )\ > 1) < h 4 E x {\VU(Z k )\ 4 } . 

This inequality, Assumption 4.1 (E), Lemma 4.1, and Lemma 4.9 imply 
that there exists a constant C(Eq) > such that, 

E x {\Z k+1 -Y(t k+1 )\ 2 } P(h\VU(Z k )\ > 1) < h 2 C(E ). 
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5 Inertial Langevin 

Next we shall consider a Metropolized version of a discretization of (2.11) that 
extends variational integrators to inertial Langevin dynamics. We begin by 
introducing the so-called geometric Langevin algorithm and discuss its prop- 
erties. The section then examines the properties of the Metropolis-Hastings 
adjusted geometric Langevin algorithm, including quantifying its accuracy in 
approximating inertial Langevin dynamics. 

The arguments in this section are very closely related to those in §4. How- 
ever, there are two key differences. First, the solution to (2.11) is no longer a 
reversible stochastic process. Yet, the transition kernel of the solution process 
composed with a momentum flip does satisfy detailed balance. Second, the 
diffusion in (2.11) is only applied to momentum and not position. These dif- 
ferences motivate this section on Metropolizing discretizations of nonreversible 
processes. However, we will omit analysis that is redundant. 

5.1 Geometric Langevin Algorithm 

Let iV and h be given, set T = Nh and tk = hk for k — 0, ...,N. We shall 
consider an integrator for (2.11) based on splitting the Langevin equations 
into Hamilton's equations for the Hamiltonian H (2.13) and and Ornstein- 
Uhlenbeck equations (2.14). The solution of Hamilton's equations will be ap- 
proximated by the discrete Hamiltonian map of a variational integrator [15]. 
While the exact flow will be used for the Ornstein-Uhlenbeck equations. These 
flows will be composed in a Strang-type fashion to obtain a pathwise approx- 
imant to Langevin's equations which we will call the Geometric Langevin Al- 
gorithm (GLA). 

Variational Integrators. Let L : IR 2n — > M denote the Lagrangian obtained 
from the Legendre transform of the Hamiltonian H, and given by: 

L(q,v) = ^v T Mv-U(q). 

A variational integrator is defined by a discrete Lagrangian : IR n x M. n x 
]R + — > IR which is an approximation to the so-called exact discrete Lagrangian 
which is defined as: 

L*(q , qi ,h)= / L(Q,Q)dt 
Jo 

where Q(t) solves the Euler-Lagrange equations for the Lagrangian L with 
endpoint conditions Q(0) = q and Q{h) = q 1 . 
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A discrete Lagrangian determines a symplectic integrator as follows. Given 
(QoiPq) e ^ 2n , a variational integrator defines an update (q 1 ,p 1 ) G M 2n by 
the following system of equations: 

p = -D x L d {q Q ,q x ,h)i 
p x = D 2 L d {q Q ,q 1 ,h). 

Denote this map by 8 h : R 2n -> R 2n , i.e., 

Oh- (q ,p ) ^ (9i,Pi), 

where (<7i,Pi) solve (5.1). One can show that 9h preserves the canonical sym- 
plectic form on M. 2n , and hence, is Lebesgue measure preserving [15]. By 
appropriately constructing Ld, the map Oh can define an approximation to the 
flow of Hamilton's equations for the Hamiltonian H (2.13). 
A discrete Lagrangian is self-adjoint if: 

Ld(q ,Qi,h) = L d (q Xl q Q ,h). (5.2) 

Some of the results that follow will be specific to the Stormer-Verlet integrator 
which can be derived from the following discrete Lagrangian: 

LMo, Qi,h) = ^(<h - q ) T M( qi - q ) - ±(U(q ) + Ufa)). (5.3) 
This discrete Lagrangian is clearly self-adjoint. 

Ornstein-Uhlenbeck Equations. The following stochastic evolution map 
^ tk +h,t k ■ K 2n -> R 2n defines the stochastic flow of (2.14): 

ipt k +h,t k '■ 

(q,P) >-> (q,e^ M ~ lh p+ ^/^^ j^ +H e~ lM ~ 1{tk+h ~ s) dW{s)^ . (5.4) 

For the distribution of the solution, the stochastic flow will be denoted simply 
by iph- Let Oh denote the transition probability density of ip tk +h,t k - By a change 
of variables, it's transition density is given explicitly by: 



i exp (p x - e-^ 1 h p Y 1 (p x - e-^- lh Po 



°h{P ,Pl) 



(27r)»/2|det(E h )| 
where 

S ft = /T 1 (Id - exp(-2 7 M" 1 /i)) M. 

Observe that this transition density does not depend on the initial or terminal 
position, since (5.4) fixes position and the Hamiltonian is separable. 
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Strang-type Splitting. GLA is defined as the following Strang-type split- 
ting pathwise approximant to (2.11): 



t k +h,t k +h/2 



Oh ° ll>: 



t k +h/2,t k 



(5.5) 



for k = 0, N — 1. For any A e £>(IR 2n ), the transition probability kernel for 
GLA is given by: 



Qh((q ,P ),A) 



Ofc/a(Po>Po) • Oft/2 (pi, Pi) • K(q ,P5)( d <li, d Pi) d Po d Pi 



Observe that the zero of the Dirac-delta measure is implicitly defined by 

P*o = -DiLdiq^q^h), 
P* = D 2 L d {q Q ,q 1 ,h). 

To make it explicit, we perform a change of variables, 



Qh((q ,P ),A) = / o h/2 (p Q ,p* ) •o h/2 (pl,p 1 ) ■ | det(D 12 L d (q , q 1 ,h))\ 
■ S^ DlL ^ qojqith ) tD2L ^ qo>qijh ))(dpl, dp* 1 )dq 1 dp 1 
From which it follows the transition density qh of GLA is given by: 



Qh((q ,P ),(QuPi)) = \ det ( D i2Ld(q ,qi,h))\ 
■ o h /2(p , -DiL d (q , q 1} h)) ■ o h/2 (D 2 L d (q , q 1 , h),Pi)- 



(5.6) 



Observe that an explicit characterization of qh is available even when the 
variational integrator is implicit. 



Stormer-Verlet Based GLA. In this paragraph a local accuracy result is 
stated for the Stormer-Verlet based GLA (2.15). Let X\ := (Q 1 ,Pi). Recall, 
that given (q ,p Q ) € M. 2n at time zero and h > 0, the algorithm computes Xi 
by the following explicit update rule: 

' Qi = <Zo + hM-'e-^^Po - ^M^VU(q ) 



+ 



^_| e - 7M -V2 (vU(q Q ) + VU(Q 1 )) 

-1 f ft „-< 7A4" _1 C/l-s1 J\Xrf„\ 



The proof of the following lemma is similar to Lemma 4.3, and hence, is 
omitted. 
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Lemma 5.1 (Local Accuracy of Stormer-Verlet-Based GLA). Assume 4-1 (C) 
and 4-1 (E) on the potential energy. For all h > and x = (q ,p ) G M. 2n , 
there exists C(q ,p ) > such that n(C) < oo and 

A) the local mean-squared error of (2.15) satisfies 



As has been demonstrated, the Markov chain defined by sampling (2.15) ev- 
ery time-step possesses a smooth probability transition density with respect to 
Lebesgue measure. For globally Lipschitz potential forces, this discretization 
is a first-order strongly accurate integrator for (2.11). However, if the potential 
force is nonglobally Lipschitz, the Stormer-Verlet based GLA is plagued with 
the same transient behavior as Euler-Maruyama for overdamped Langevin. 
As before, a Metropolis-Hastings method is proposed to stochastically stabi- 
lize this Markov chain. 

5.2 MAGLA and its Properties 

MAGLA is the Metropolis-adjusted GLA. Given h and (Q k ,P k ), MAGLA 
computes a proposal move according to a step of GLA: 




B) the local mean deviation of (2.15) satisfies 



E-{x!-y(/i)}| <c(q , Po )h 3 . 



(Qjfc+l) P t+l) = A k +h,t k +h/2 °0 h O Tpt k +h/2,t k {Q k , P k)- 



MAGLA accepts this proposal move with probability: 



<Xh{(QQ,Po)Mi,Px)) = 1 A 



Qh((qi,Pi), (go, -Po)MQi,Pi) 
Qh((qo,P ), (Qi, -Pi))n(q ,p ) 



Altogether, the MAGLA update is given by: 




otherwise 




for k = 0, iV — 1. We stress the momentum flip in the acceptance prob- 
ability and upon rejection is introduced because inertial Langevin dynamics 
is nonreversible. Yet, the exact transition density of the solution to inertial 
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Langevin composed with a momentum is reversible, i.e., does satisfy detailed 
balance. We will see in the proof of Lemma 5.3 that without this momentum 
flip, pathwise accuracy cannot be achieved with MAGLA. 

MAGLA preserves /i, and hence, is not transient even when its underlying 
variational integrator is explicit. In fact, it is often straightforward to classify 
MAGLA as an ergodic Markov chain even if the potential force is nonglobally 
Lipschitz. However, with an explicit variational integrator MAGLA is often 
no longer geometrically ergodic even with momentum flips. To correct this 
problem, a modification of MAGLA can be implemented where the drift is 
truncated in regions where an explicit discretization of the drift causes high 
rejection rates. This modification which would be analogous to MALTA is not 
implemented here. Instead, we will concentrate on proving pathwise accuracy 
from an initial condition restricted to the equlibrium measure. This restric- 
tion enables us to obtain bounds on relevant moments of the Metropolized 
integrator. To prove pathwise accuracy the following lemmas will be useful. 

The following lemma shows for an arbitrary symmetric variational inte- 
grator, the acceptance probability of MAGLA is related to the energy change 
in its variational integrator. This implies that the acceptance probability of 
MAGLA has mechanical and intrinsic meaning. It also simplifies the subse- 
quent analysis of MAGLA. 

Lemma 5.2 (Acceptance Probability of MAGLA). Let denote the transi- 
tion probability density of GLA. Let La denote the discrete Lagrangian associ- 
ated with the variational integrator 9h- Assume that La is self-adjoint (cf. 5.2). 
Then, the acceptance probability of MAGLA satisfies: 

ah((q ,P ),(qi,Pi)) = 1 Aexp(-/3AE(qr ,g 1 )) , 
where we have introduced: 

AE(q , q x ) = -D 2 L d (q , q 1} hf 'M" 1 D 2 L d (q , q lt h) + U ( 9l ) 

- ^DiL d (q , Ql , hfM^DxLaiqu, q x , h) - U(q Q ) . (5.7) 



Proof. Introduce the abbreviations: DiL d = DiL d (q Q ,q 1 ,h) for i — 1,2, and 
Bh/2 — exp(—'jM~ 1 h/2). Expanding (5.6) yields, 

// w \ \ \^(D l2 L d (q Q ,q l ,h))\ 
qh ((q ,p ), ( 9l , Pl )) = (27r)1det(Sfe/2)| 

exp ({D x L d + B h/2 p ) T M-\ld - B^ 2 )~ 1 (D 1 L d + B h/2 p Q ) 

+ ( Pl - B h/2 D 2 L d ) T M-\ld - Bl /2 )-\ Pl - B h/2 D 2 L d ))) (5.8) 
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The self-adjoint property of the discrete Lagrangian implies that: 

D 2 L d (q , q lt h) = D x L d (q 1} q , h), 
DxL d (q , q 1: h) = D 2 L d (q x , q , h). 



Hence, 
Qh((qi,Pi),(q ,Po)) 



det(D 12 L d (q ,q 1 ,h))\ 



(27r)"|det(S V2 )| 

exp {-^ ((D 2 L d + B h/2Pl fM-\ld - B\ j2 Y\D 2 L d + B h/2Pl ) 



+ (p - B h/2 D x L d ) T M-\ld - B 2 h/2 )-\p - B^DM)) (5.9) 

From (5.8) and (5.9), it follows that 

gft((qi,Pi)»(go»-Po)) = 
Qh((q ,Po),(Qi,-Pi)) 

exp (-^ {D 2 L T d M- l D 2 L d - D 1 L d M~ 1 D 1 L d - p^M l Pl + p T M~ 

And, finally, 

9fe((gi,Pi), (go, -Po)M(gi,Pi)) = 
Qh((q ,Po), (Qi, -Pi))tt((9o.Po)) 



Vo) 



exp ( - > ( l -D 2 L T d M- l D 2 L d + E%0 - i 



D l L^M- 1 D l L d - U(q ] 



The following lemma is analogous to Lemma 4.5, and roughly speaking, 
quantifies how often rejections occur in MAGLA. 

Lemma 5.3 (Stagnation Probability of Stormer-Verlet-Based MAGLA). Con- 
sider the Metropolized Stormer-Verlet based GLA (2.15). Assume 4-1 (C) and 
4-1 (E) on the potential energy. For any integer I > 1, there exists h c > and 
K£ > 0, such that for all positive h < h c and for all x = (q ,p ) G M n , 



where 



E x {(a h ((q ,p ),(Ql,Pl)) - If} < C(q ,p )h ee , 



Ql = q + hM- l e^ M ~ lh ' 2 p Q - ^M- l WU(q Q ) 

+h^/2p z m t t k k+h/2 M- 1 e^ M ' 1 ^ +h ' 2 - s UW{s), 
p* = e -rM-i hpo _ | e -V-V2 (W(q ) + VU{QD) 
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Proof. From Lemma 5.2 it is clear that, 
E x {(a h ((q ,p Q ),(Ql,Pl))-ir} = 

/ {esxp(-/3AE(q Q ,q 1 )) A 1 - if q h ((q Q ,p ) } (q 1 ,p 1 ))dq 1 dp l 

Substitute from (5.6) into the above to obtain, 
E*{(a h ((q ,p ),(Ql,Pl))-ir} = 

f | det(D 12 L d (q , qi ,h))\ ■ (exp(-/?A£(q , qO) A 1 - if 

JR 2n 

■ o h/2 (p , -DiL d (q , q Xl h)) ■ o h/2 (D 2 L d (q , q x , h),p 1 )dq 1 dp 1 
Integrate with respect to p x to obtain the following simplified expression, 
E*{(a h ((q ,p ),(QtPt))-l) 2e } = 

[ | det(D 12 L d ) | ■ (exp(-/3A£(g , A 1 - if 

JM. n 

■ o h /z(p , -DxL d (q , q 1} h))dq 1 
Let R h (q ) = { Ql E R n : AE(q , qi ) > 0}. Then, 
E x {(a h {(q ,p Q ),(Ql,Pl))-lf} 

= [ \det(D 12 L d )\- {e~^' I o^-i) 2t 'O h/2 (p ,-D 1 L d (q ,q 1 ,h))dq 1 

[ \det(D 12 L d )\.(e^ AE ^)-if 
jRh(io) 



(27r)"/ 2 |det(S V2 )| 



-r»iZ < ,( 9o ,g 1 ,ft)-e-T« lh / 2 Po ) T S- / 1 2 (~D 1 L d ( qo ,q 1 ,? i )- e -^ lfe /2 po 



(5.10) 



Let Ah/ 2 be the decomposition matrix arising from the Cholesky factorization 
of Eft/2, i.e., A h / 2 A^ 2 = Eft/ 2 . Introduce the map </? : lR n — > lR n , with 
qr x = (p(rj), and defined implicitly by the following relation: 

- £>iL d (g 0> qr 1; fc) = exp^M" 1 fc/2)p + Aft/ary. (5.11) 

Set Rh(q ) = <^ _1 (-Rft(g ))- A change of variables of (5.10) under the map ip 
yields, 

E x {(a h ((q ,p ),(Ql,Pl))-lf} = 

(2tt)-"/ 2 / ( e -/3AB(g .^W) _ l) 2 V^d77 (5.12) 
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Up to this point the argument has been for a general GLA. Now, the 
proof is specialized to the Stormer-Verlet based GLA. For the Stormer-Verlet 
discrete Lagrangian (5.3): 



D 1 L d (q ,q 1 ,h) = -M^-^- - ^VU{q ), 



D 2 L d (q , q lt h) = M^-^ - ^VUfa). 
Substitute these expressions into (5.7) gives: 

AE(q , q x ) = U{q x ) - U(q ) - \ (Vtffo) + V[% ), <Zi - q ) 

+ y (VU{ qi ) T M- l VU{ qi ) - VU{q ) T M~ l VU{q Q )) (5.13) 

Substitute the change of variables implicitly defined by (5.11) into (5.13) and 
then Taylor expand AE about h = to obtain, 

AE fq , q + hM^e'^'^p^ + hM~ l A h/2V - ^M^W^)) = 

- ^D 3 U(q ) ■ (M-Vo) 3 + ^D 2 U(q ) ■ (JIT^, M^VU^)) 

+ 0{h 7 / 2 ) 

As in Lemma 4.5, a standard application of Laplace's method together with 
Assumption 4.1 (E) gives the desired result. ■ 

Using Lemma 5.3, we are now in position to quantify the local accuracy of 
MAGLA. 

Lemma 5.4 (Local Accuracy of Stormer-Verlet-Based MAGLA). Assume 4-1 
(C) and 4-1 (E) on the potential energy. For all h > and x = (q ,p ) G M. n , 
there exists C(q ,p ) > such that fi(C) < oo and 

A) the local mean-squared error of MAGLA satisfies 

(E*{\X 1 -Y(h)\ 2 }f 2 <C(q ,p )hV 2 ; 

B) the local mean deviation of MAGLA satisfies 

\W{X x -Y{h)}\<C{q Q) p Q )h\ 



5.2 MAGLA and its Properties 



37 



Remark 5.1. A detailed proof is omitted since similar steps are taken in the 
proof of Lemma 4-6. We remark that it follows from the definition of MAGLA 
(2.18), 

E x {\Xi - Y(h)\ 2 } < E x {\Xl - Y(h)\ 2 } 

+ E x {\(f(q ,p ) - Y(h)\ 2 (1 - a h ((q ,p ), (Q*. Pi)))} 

This estimate reveals that the local mean-squared error of MAGLA is a sum 
of the local mean-squared error of GLA ( cf. Lemma 5.1) and the error that 
arises from a potential rejection weighted by the probability of rejection. This 
error is 0(1) because when a rejection happens in (2.18), the momentum is 
flipped. However, the probability of rejection is within the order of accuracy of 
the method according to Lemma 5.3. 

Assumption 5.1 (Globally Lipschitz Continuous Process). For s < t, let 
Y t , s (x) denote the evolution operator of the solution to (2.11): with Y S)S (x) = 
x and for r < s <t recall the Chapman- Kolmogorov identity Y t , s ° Y Sir (x) = 
Y t>r (x). Set 

A = Y s+h>s (x) - Y s+h)S (y) - (x - y), 

For every K > there exists a h c > 0, such that for all positive h < h c , for 
all x,y G M. n , and for all s > 0, 

A) 

E {\Y s+hjS (x) -Y s+h , s (y)\ 2 } < \x-y\ 2 (l + Kh); 

B) 

E{|A| 2 } < Kb 2 \x-y\ . 

As discussed MAGLA involves a momentum flip whenever a proposal move 
is rejected. Despite this MAGLA provides a pathwise approximant to (2.11) 
as summarized by the following theorem. 

Theorem 2.3 (Stormer-Verlet-Based MAGLA Strong Accuracy from Equi- 
librium). Assume \.l (C) and 4-1 (E) on the potential energy, and 5.1 on 
(2.11). Then for every T > 0, there exists h c > and C{T) > such that for 
all h < h c , for all x e R 2n , and for all t 6 [0, T), 

(E^E x {\X [t/hl -Y(t)\ 2 }y /2 < C(T)h. 

Remark 5.2. A detailed proof is omitted as it is similar to the proof of The- 
orem 2.2. Recall, the main tools needed for that proof are local accuracy of 
MAGLA which follows from Lemma 5.4, and the invariance of fi under both 
the transition kernel of the solution to (2.11) and the Markov chain defined by 
Stormer-Verlet-based MAGLA. 
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6 Conclusion 

This paper examined the pathwise accuracy of Metropolized integrators ap- 
plied to overdamped and inertial Langevin dynamics. The paper primarily 
analyzed explicit discretizations of these ergodic SDEs. For explicit discretiza- 
tions if the drift is nonglobally Lipschitz, discretization errors often lead to 
stochastically unstable Markov chains. A Metropolis-Hastings method was 
shown to stochastically stabilize these discretizations. The paper used this 
stochastic stability to quantify the error induced by the Metropolis-Hastings 
methods in approximating the solution to the SDE. 

Although the paper restricted to specific discretizations of these SDEs as 
proposal moves, the strategy to prove the results is rather general. The strat- 
egy can be used to prove pathwise accuracy of Metropolized versions of higher 
order accurate, adaptive and multistep discretizations of overdamped or iner- 
tial Langevin dynamics (See, e.g., [20,37].). In fact, this strategy applies to 
a larger class of SDEs, namely those which possess a transition density that 
satisfies detailed balance-type condition. 

Our strategy relies on two main ingredients. First, that the local error 
induced by the Monte-Carlo method depends upon the time-step size in an 
algebraic fashion. For example, often we found that the local mean-squared 
error of the Metropolized Integrator can be decomposed as: 

E x {\X 1 -Y(h)\ 2 } = 

E x {\Xl - Y(h)\ 2 a h (x, XI)} + E'{\(p(x) - Y(h)\ 2 (1 - a h (x, X*))} 

V v ' v ' 

Accepted Proposal Move Rejected Proposal Move 

where Xi is a single step of the Metropolis-adjusted discretization, Y(h) is 
the exact solution of the SDE after a single step, X\ is the proposal move 
step (a single step of the unadjusted discretization), is the probability of 
accepting a proposal move, and tp is a deterministic map applied if a pro- 
posal move is rejected. The first term is bounded by E a! {|Xj — Y(h)\ 2 }, 
i.e., the local order of accuracy of the discretization. The second term is 
bounded by using the Cauchy-Schwarz inequality, together with bounds on 
E*{\ip{x) -Y{h)\ A } 1 ' 2 and E X {{1 - a h {x, X*)) 2 } 1 / 2 . For MALA <p was the 
identity, so that a rejected proposal was nearby the solution. To be precise, 
E x {\p(x) - Y{h)\ A } 1 / 2 < 0(h). The likelihood of a rejection to occur turns 
out to be E X {(1 — ah{x, XI)) 2 } 1 / 2 < 0(h 3 / 2 ). Since the local mean-squared 
error of the unadjusted discretization was 0(h 3 ^ 2 ), the local mean-squared er- 
ror of MALA was dominated by the error incurred by rejected proposal moves 
which was 0(/i 5/ ' 4 ). 

For MAGLA ip was not the identity, but involved a momentum flip. This 
momentum flip was imperative because the exact transition density of iner- 
tial Langevin dynamics does not satisfy detailed balance, but a modified de- 
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tailed balance condition that is derived from composing the solution of inertial 
Langevin with a momentum flip. Consequently, a rejected proposal move for 
MAGLA loses local accuracy, i.e., E x {\cp(x) - Y^h)^} 1 ' 2 ~ 0(1). Neverthe- 
less, the probability of rejection turns out to be within the order of accuracy 
of the SV-based GLA, i.e., - a h (x, X*)) 2 } 1 / 2 < 0(h 3 ). Thus, we found 

the error incurred by a rejected proposal move to be of the same order as 
that of an accepted proposal move. Hence, the local mean-squared accuracy 
of MAGLA was 0(h 3 ^ 2 ). 

The second ingredient involves bounds on moments of the integrator that 
are uniform in the time-step size. These bounds enabled the local error esti- 
mates to be extended to global error estimates, and hence, pathwise conver- 
gence on finite time-intervals. These bounds are automatic if initial conditions 
are restricted to the known equilibrium distribution the Metropolis- adjusted 
integrator is designed to sample. To relax this restriction on initial conditions, 
geometric ergodicity played a key role to obtain such bounds. MALA is often 
ergodic, but because its proposal chain is often transient, is not geometrically 
ergodic. This transience is due to a numerical instability in forward Euler for 
large energy values. By suitably truncating the drift at high energy values, 
one can ensure the proposal dynamics is not transient. MALA with trun- 
cated drift, or MALTA is known to often be geometrically ergodic on infinite 
time- intervals where MALA is not. The paper used this property to prove 
MALTA is pathwise convergent on finite time-intervals without a restriction 
on its initial condition. We remark that MALTA can be generalized to explicit 
higher-order, multistep and adaptive integrators for overdamped and inertial 
Langevin equations. 

Finally, we emphasize that GLA can be extended to non-flat configuration 
spaces and multiple time-steps. Indeed, its underlying variational integrator 
can incorporate holonomic constraints (via, e.g., Lagrange multipliers) [38] and 
multiple time steps to obtain so-called asynchronous variational integrators 
[14]. A natural choice for the variational integrator would be a SHAKE or 
RATTLE algorithm [38, 15]. The Ornstein-Uhlenbeck equations (2.14) are 
defined on a flat vector space since the configuration is held fixed. Moreover, 
a review of the proof of Lemma 5.2 shows that the probability transition 
density of GLA can be explicitly characterized even if the configuration space 
is not flat. By inspection, this density is absolutely continuous with respect to 
the standard volume measure on that non-flat space. Consequently, GLA on 
manifolds can be used as proposal dynamics in a Metropolis-Hastings context, 
and MAGLA can be extended to non-flat configuration spaces to treat, e.g., 
inertial Langevin equations with holonomic constraints as in [37]. Moreover, 
MAGLA on manifolds can readily classified as an ergodic Markov chain on 
that non-flat phase space. 
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7 Proof of Lemmas 4.1, 4.2 and 4.3 

Proof of Lemma 4.1 

Lemma 4.1 A) Let G(x) := U(x) e . By the Taylor-Ito formula 
dG(Y(t)) = L {G(Y(t))} dt + Martingale. 
Assumption 4.1 (B) on the generator of the (SDE) implies 



M, 



E x {G(Y(t))} < e-° et G(x) + -±(1 - e 

Of 



-S f t\ 



From this expression it is clear that for all t > and for all x G W 1 

Me 



E x {G(Y(t))} < G(x) + 

Lemma 4.1 B) Assumption 4.1 (A) implies 

G(x) > K l \x\\ \/ xeW 1 . 

Hence, the upper bound derived in the proof of Lemma 4.1 (A) on 
E x {G(Y(t))} implies an upper bound on E x \\Y(t)f 



Lemma 4.1 C) From the solution to (2.1) it is apparent 



E x {\Y(t) -x\ 2e } = E a 



[ VU(Y(s))ds + ^2f3~ l [ dW 
Jo Jo 







[ dW 




Jo 


I 



The triangle inequality implies that, 
E x {\Y(t) - x\ 2e } <E x ^l VU(Y(s))ds 
Since the function g(x) = x 2£ is strictly convex, 



VW 1 



dW 



2C 







21 






< 2 2e ^E x | 


/ VU{Y{s))ds 


+ (2/T 1 )' 


/ dW 


2^ 




Jo 




Jo 
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Recall, that the higher order central moments of a normal random vari- 
able £ ~ jV(0, t) are given by: 

Since the function = x is convex for positive reals, 









2c 




E x < 




/ dW 




> < n e E x 






Jo 







An application of Cauchy-Schwarz inequality yields, 



VU(Y(s))ds 



21" 



< t E E a 



\VU(Y(s))\ 2 ds 



According to Assumption 4.1 (E), there exists a constant K > such 
that t 

I \VU{Y(s))\ 2 ds<K [ {l + U{Y(s)) 2 )ds. 
Jo Jo 

An application of Holder's inequality and Lemma 4.1 (A) imply there 

exists a possibly different constant K > such that 



VU(Y(s))ds 



2tr 



<Kt M (l + U(x) M ). 



Hence, 



E x {\Y(t) - xf) < 2 2i ~ x [t 2i K(l + U(x) 2e ) + *<M (^jj 



Proof of Lemma 4.2. 

Lemma 4.2 A) By the Taylor- Ito formula, 



s+h,s 
rh 



» - Y s+Ks {y)\ 2 = \x-y\ 



+ 2 / (Y s+riS (x) - Y s+r , s (y), VU(Y s+r , s (y)) - VU(Y s+r , s (x))) dr 
Jo 

According to Assumption 4.1 (D), 

2 f h 2 

\Y S+Ks {x) - Y S+Ks {y)\ 2 < \x - y\ 2 + 2K / \Y s+T)S (x) - Y s+T)S {y)\ 2 dr 

Jo 

Gronwall's lemma implies that, 

\Y s+hyS {x) - Y s+Ks (y)\ 2 <\x- y\ 2 exp(2hK) 



7 Proof of Lemmas 4.1, 4.2 and 4.3 42 



Lemma 4.2 B) To prove the second inequality, observe that: 



IAI 2 



/ (W(y s+r , s (y)) - VU{Y s+r!S {x))) dr 
Jo 



An application of the Cauchy-Schwarz inequality implies, 

|A| 2 < h / \VU(Y s+r , s (y)) - VU(Y s+r , s (x))\ 2 dr 
Jo 

Assumption 4.1 (C) implies, 

|A| 2 < h / ([/(r^,,^)) + U(Y s+r Jx))) \Y s+r Jy) - Y s+r , s (x)\ 2 dr 



Assumption 4. 1 (A) and the triangle inequality imply there exists a con- 
stant K > such that 

|A| 2 < hK / (U(Y s+r ^y)) 2 +U(Y s+ryS (x)) 2 ) \Y a+r>s {y) - Y s+r , s (x)\dr 
Jo 

The Cauchy-Schwarz inequality implies that 

E { | A| 2 } < hK J (E {U{Y s+rtS {y)) 4 } 1/2 + E {U (Y s+r>s (x)) 4 } 1/2 
■E{\Y s+r , s (y)-Y s+r , s (x)\ 2 } 1/2 dr 
Lemma 4.1 implies with a possibly different constant K > 

E{|A| 2 } <hK(l + U(y) 2 + U(xf) [ E {\Y a+r>a (y) - Y s+r , s {x)\ 2 } l/2 dr. 

Jo 

While the first part of this lemma implies, again with a possibly different 
constant K > that, 

E {|A| 2 } < h 2 K(l + U(y) 2 + U(x) 2 ) \x - y| . 
Proof of Lemma 4.3 

Lemma 4.3 A) According to (2.1) and (2.3), the difference between the 
Euler-Maruyama discretization and the solution to the SDE after a single 
step takes the form: 

Xi - Y{h) = - [ (W(cc) - W(Y"(s)) ds 
Jo 
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The Cauchy-Schwarz inequality implies 



X 1 -Y(h) ><h E x {\VU(x) - \7U(Y(s))\ 2 }ds 



E a 



Assumption 4.1 (C) implies that there is a constant K > such that, 



X 1 -Y(h) 



21 <hKE x [ [ (U{x) 2 + U{Y{s)) 2 )\x-Y(s)\ 2 ds 



A second application of Cauchy-Schwarz yields, 
E*{ Xl - Y(h) 2 | < hK ^jf U{x) 2 E x {\x - Y(s)\ 2 } ds 

+ j\ x {U{Y{s)f} l/2 E x {\x - Y{s)f} 1/2 ds^J 
Lemma 4.1 implies that there exists a constant K > such that 



X x -Y(h) \ < Kh 3 (l + U(x) 



E a 



Lemma 4.3 B) The Ito- Taylor formula implies that, 

"h ps 



E x \X 1 -Y(h) 



E a 



o jo 



L{VU(Y{r))}drds 



Applying the Cauchy-Schwarz inequality twice gives, 



E x < Xi — Y(h) 



< h 



E a 



L{VU(Y(r))}dr 



ds 



ph rs 

<h s \E x {L{VU{Y{r))}}\ 2 drds 
Jo Jo 

Jensen's inequality implies that, 

E s {x!-Y(/i)} 2 <li| s E x {\L{VU(Y(r))}\ 2 } drds 
Assumption 4.1 (E) implies the existence of a K > such that 

E x \x x -Y(h)} 2 < hj s K(l + E x {U(Y(r)) 4 })drds 
An application of Lemma 4.1 gives the required result. 
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